An Identity for Kernel Ridge Regression
This paper derives an identity connecting the square loss of ridge regression
in on-line mode with the loss of the retrospectively best regressor. Some
corollaries about the properties of the cumulative loss of on-line ridge
regression are also obtained.
Comment: 35 pages; extended version of ALT 2010 paper (Proceedings of ALT
2010, LNCS 6331, Springer, 2010).
Prediction with expert advice for the Brier game
We show that the Brier game of prediction is mixable and find the optimal
learning rate and substitution function for it. The resulting prediction
algorithm is applied to predict results of football and tennis matches. The
theoretical performance guarantee turns out to be rather tight on these data
sets, especially in the case of the more extensive tennis data.Comment: 34 pages, 22 figures, 2 tables. The conference version (8 pages) is
published in the ICML 2008 Proceeding
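As a quick illustration of the loss function involved (this is the standard Brier loss, not the paper's prediction algorithm), the Brier score of a probability forecast over k mutually exclusive outcomes is the squared Euclidean distance between the forecast vector and the indicator vector of the realized outcome. A minimal sketch:

```python
import numpy as np

def brier_loss(p, y):
    """Brier loss of forecast p (probabilities over k outcomes)
    against realized outcome y: sum_i (p_i - 1{y = i})^2."""
    e = np.zeros_like(p)
    e[y] = 1.0                      # one-hot indicator of the outcome
    return float(np.sum((p - e) ** 2))

# A confident correct forecast incurs a small loss,
# a confident wrong one a large loss.
print(brier_loss(np.array([0.8, 0.1, 0.1]), 0))
print(brier_loss(np.array([1.0, 0.0]), 1))
```

For a football match, k = 3 (home win / draw / away win), so each forecast is a probability vector of length three.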
Competing with Gaussian linear experts
We study the problem of online regression. We prove a theoretical bound on
the square loss of Ridge Regression. We do not make any assumptions about input
vectors or outcomes. We also show that Bayesian Ridge Regression can be thought
of as an online algorithm competing with all the Gaussian linear experts.
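To make the online regression protocol concrete, the sketch below shows generic online ridge regression, which predicts each outcome from the examples seen so far; the function name and toy usage are illustrative assumptions, not the paper's notation:

```python
import numpy as np

def online_ridge(X, Y, a=1.0):
    """Online ridge regression: before seeing y_t, predict with
    w_t = (a I + sum_{s<t} x_s x_s^T)^{-1} sum_{s<t} y_s x_s."""
    n, d = X.shape
    A = a * np.eye(d)               # regularized Gram matrix
    b = np.zeros(d)
    preds = []
    for t in range(n):
        w = np.linalg.solve(A, b)   # ridge solution on past data only
        preds.append(float(X[t] @ w))
        A += np.outer(X[t], X[t])   # then reveal (x_t, y_t) and update
        b += Y[t] * X[t]
    return np.array(preds)

# Toy data: outcomes exactly twice the (constant) input.
X = np.ones((50, 1))
Y = 2 * np.ones(50)
preds = online_ridge(X, Y)
```

The first prediction is 0 (no data yet), and later predictions approach the best linear fit as the regularization term is swamped by the data; no stochastic assumptions on the inputs or outcomes are needed, matching the setting of the abstract.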
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications
Trained on large datasets, deep learning (DL) can accurately classify videos into hundreds of diverse classes. However, video data is expensive to annotate. Zero-shot learning (ZSL) proposes one solution to this problem. ZSL trains a model once, and generalizes to new tasks whose classes are not present in the training dataset. We propose the first end-to-end algorithm for ZSL in video classification. Our training procedure builds on insights from recent video classification literature and uses a trainable 3D CNN to learn the visual features. This is in contrast to previous video ZSL methods, which use pretrained feature extractors. We also extend the current benchmarking paradigm: Previous techniques aim to make the test task unknown at training time but fall short of this goal. We encourage domain shift across training and test data and disallow tailoring a ZSL model to a specific test dataset. We outperform the state-of-the-art by a wide margin. Our code, evaluation procedure and model weights are available at this http URL
Exploiting Invariance in Training Deep Neural Networks
Inspired by two basic mechanisms in animal visual systems, we introduce a
feature transform technique that imposes invariance properties in the training
of deep neural networks. The resulting algorithm requires less parameter
tuning, trains well with an initial learning rate 1.0, and easily generalizes
to different tasks. We enforce scale invariance with local statistics in the
data to align similar samples at diverse scales. To accelerate convergence, we
enforce a GL(n)-invariance property with global statistics extracted from a
batch such that the gradient descent solution should remain invariant under
basis change. Profiling analysis shows that our proposed modifications take
5% of the computation of the underlying convolution layer. Tested on
convolutional networks and transformer networks, our proposed technique
requires fewer iterations to train, surpasses all baselines by a large margin,
works seamlessly with both small and large batch sizes, and applies to
different computer vision and language tasks.